# Adds chapter on security #366

## Conversation
> Even without using `repeat` or `for`, a [Billion laughs attack](https://en.wikipedia.org/wiki/Billion_laughs_attack)
> could exist for any data format with macro expansion, and it is certainly possible with Ion 1.1.
Neat, we've discussed this attack a bunch of times but I didn't know it had an established name.
> This is unlikely to be a concern in practice because TDL is not arbitrary code.
> TDL is intentionally not Turing complete, to make it impossible to perform arbitrary computation.
> It also has a very limited domain—it can only transform/produce Ion data model values.
I think it would be good to have a brief section on the intentional limitations built into TDL at the top of this section. Stating early on that it cannot recurse, has no forward references, etc., would help clarify the reader's mental model going into the discussion of actual vulnerabilities.
> ## Arbitrary-sized values
>
> The Ion specification places no limits on the size of Ion values, so if an attacker sends a sufficiently large value,
> it could consume enough system resources to disrupt the application reading the value.
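To make this concrete, here is a minimal, library-agnostic Rust sketch of the kind of mitigation an implementation could apply: reject a value whose declared length exceeds a configured budget before buffering it. The names (`ReaderLimits`, `ValueHeader`, `check_value_size`) are hypothetical and do not correspond to any real Ion API.

```rust
// Hypothetical sketch: fail fast on a value's declared length instead of
// allocating first and discovering the problem after memory is committed.

struct ReaderLimits {
    /// Maximum encoded size (in bytes) allowed for any single value.
    max_value_bytes: usize,
}

struct ValueHeader {
    /// Declared length of the value's body, as parsed from its header.
    declared_length: usize,
}

#[derive(Debug)]
enum ReadError {
    ValueTooLarge { declared: usize, limit: usize },
}

fn check_value_size(header: &ValueHeader, limits: &ReaderLimits) -> Result<(), ReadError> {
    if header.declared_length > limits.max_value_bytes {
        return Err(ReadError::ValueTooLarge {
            declared: header.declared_length,
            limit: limits.max_value_bytes,
        });
    }
    Ok(())
}

fn main() {
    let limits = ReaderLimits { max_value_bytes: 8 * 1024 * 1024 };
    let oversized = ValueHeader { declared_length: 64 * 1024 * 1024 };
    assert!(check_value_size(&oversized, &limits).is_err());
}
```

Checking the declared length up front means the reader never commits memory proportional to an attacker-controlled size.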
The token bucket solution covers a lot of outsized-expansion issues, but it doesn't cover macros where (by using built-in operations like `make_string`) an enormous single value could be produced. We may wish to either restrict the number of input arguments or put a size limit on `make_string`, `make_blob`, etc.
I think that most programming languages already have a limit on the length of strings, byte arrays, lists/vecs, etc. Do you think we need more than that?
I think we're talking about two different things. I'm describing a flavor of the small-input-big-output problem that doesn't generate lots of values; it generates one gigantic value.
```
(.make_string (.repeat 1_000_000_000 "lol"))
```
The token bucket/fuel approach doesn't handle this case because:

- `repeat` is nested inside of an e-expression, so it isn't subject to the token limit
- `make_string` "only" produces one value, so it's within the token limit.
It makes sense for us to offer limits on (for example) the number of arguments that can be passed to `make_string`.
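As an illustration of the limit being proposed here, the following Rust sketch caps the total size of a `make_string`-style concatenation while it is being assembled, so a single gigantic value is rejected before it is fully materialized. `EvaluatorLimits` and `make_string_with_limit` are invented names, not part of any Ion implementation.

```rust
// Hypothetical sketch: bound the output of a string-constructing macro as it
// is built, rather than after the fact.

struct EvaluatorLimits {
    /// Maximum number of bytes a single constructed string may contain.
    max_constructed_string_bytes: usize,
}

#[derive(Debug)]
enum ExpansionError {
    ConstructedValueTooLarge { limit: usize },
}

fn make_string_with_limit<I>(args: I, limits: &EvaluatorLimits) -> Result<String, ExpansionError>
where
    I: IntoIterator<Item = String>,
{
    let mut out = String::new();
    for piece in args {
        // Check before appending, so the budget bounds peak memory use even
        // when the argument stream itself is lazily expanded (e.g. by `repeat`).
        if out.len() + piece.len() > limits.max_constructed_string_bytes {
            return Err(ExpansionError::ConstructedValueTooLarge {
                limit: limits.max_constructed_string_bytes,
            });
        }
        out.push_str(&piece);
    }
    Ok(out)
}

fn main() {
    let limits = EvaluatorLimits { max_constructed_string_bytes: 1024 };
    // Stand-in for the argument stream of (.make_string (.repeat 1_000_000_000 "lol")):
    let bomb = std::iter::repeat("lol".to_string()).take(1_000_000_000);
    assert!(make_string_with_limit(bomb, &limits).is_err());
}
```

Because the check runs per argument, the evaluator stops after roughly `limit / 3` pieces here instead of attempting a multi-gigabyte allocation.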
> `repeat` is nested inside of an e-expression, so it isn't subject to the token limit

Re-reading that section of the doc, I may have misunderstood this part. If `repeat` is also pulling from the same token bucket as the parent e-expression, this might be ok as-is.
Yes, that is my intention—that every value that is produced at any level in a macro evaluator would count, and perhaps that any macro that evaluates to nothing also uses one token to produce nothing.
I don't know how prescriptive we need to be. The point is to set a limit on the expansion to prevent expansion bombs, so I don't think Ion 1.1 libraries need to implement this in exactly the same way because we don't necessarily need to have a precise limit. (After all, if your data looks like an expansion bomb, then you might need to rethink your data.)
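A minimal sketch of that shared-budget idea, assuming every value produced at any depth of expansion draws from one fuel counter; `ExpansionFuel` is an illustrative name, not an API from any Ion library.

```rust
// Hypothetical sketch: a single fuel budget threaded through every level of
// macro expansion, so nested invocations share the parent's allowance.

struct ExpansionFuel {
    remaining: u64,
}

#[derive(Debug)]
enum ExpansionError {
    FuelExhausted,
}

impl ExpansionFuel {
    fn new(budget: u64) -> Self {
        ExpansionFuel { remaining: budget }
    }

    /// Charge one token per value an expansion step yields; a step that
    /// expands to nothing is also charged one token, as suggested above.
    fn charge(&mut self, produced_values: u64) -> Result<(), ExpansionError> {
        let cost = produced_values.max(1);
        if cost > self.remaining {
            return Err(ExpansionError::FuelExhausted);
        }
        self.remaining -= cost;
        Ok(())
    }
}

fn main() {
    let mut fuel = ExpansionFuel::new(1_000_000);
    for _ in 0..10 {
        fuel.charge(100_000).unwrap();
    }
    // The budget is exhausted, so any further expansion is rejected.
    assert!(fuel.charge(1).is_err());
}
```

The exact budget and charging scheme could vary between implementations, consistent with the point above that a precise limit is not required.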
This is a good read.
> Although Ion 1.1 does not specify a maximum size for symbol tables or macro tables, Ion implementations _MAY_ impose
> upper bounds on the size of symbol tables, macro tables, module bindings, and any other direct or indirect component of
> the encoding context.
I'd suggest that an implementation may also allow the user to configure size limits for any kind of value, including user values.
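A hypothetical configuration surface along those lines might look like the following Rust sketch. None of these names correspond to a real Ion implementation, and the default numbers are placeholders chosen for illustration only.

```rust
// Hypothetical sketch: user-configurable upper bounds covering both the
// encoding context and individual user values.

#[derive(Clone, Debug)]
struct IonReaderLimits {
    max_symbol_table_entries: usize,
    max_macro_table_entries: usize,
    max_encoding_context_bytes: usize,
    /// Applies to any single user value, not just the encoding context.
    max_user_value_bytes: usize,
}

impl Default for IonReaderLimits {
    fn default() -> Self {
        // Generous defaults so ordinary data is unaffected, while extreme
        // outliers (likely malicious or malformed) are rejected early.
        IonReaderLimits {
            max_symbol_table_entries: 1_000_000,
            max_macro_table_entries: 100_000,
            max_encoding_context_bytes: 64 * 1024 * 1024,
            max_user_value_bytes: 256 * 1024 * 1024,
        }
    }
}

fn main() {
    // An application handling untrusted input might tighten the defaults.
    let limits = IonReaderLimits {
        max_user_value_bytes: 8 * 1024 * 1024,
        ..IonReaderLimits::default()
    };
    println!("{limits:?}");
}
```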
Issue #, if available:
None
Description of changes:
Adds a chapter describing various security considerations for Ion 1.1
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.